Project Evaluation

The initial project proposal in Unit 6 set out a clear objective: to design and implement a robust, normalized relational database for ADR Logistics, a transportation company facing challenges with fragmented vehicle and maintenance data. The proposal highlighted the need for a single source of truth for vehicle status, maintenance records, trip information, and real-time sensor data. Our intent was to move ADR Logistics from scattered spreadsheets and ad hoc data storage toward a structured database system that would support timely insights, reporting, and long-term operational scalability. The proposal outlined the use of a simulated logistics maintenance dataset from Kaggle to inform the design and stressed the importance of normalization, referential integrity, and user-focused table relationships.

The final deliverable in Unit 11 remained closely aligned with this vision, but the process of implementing the database introduced new complexities and valuable lessons. Our team successfully developed a normalized data model based on the four major entities identified in the proposal: Vehicle, Maintenance, Trip_Info, and Sensor_Data. Using SQL, we built dedicated tables for each entity, ensuring that primary keys uniquely identified each record and that foreign key constraints enforced referential integrity. The result was a database in Third Normal Form, which minimized redundancy and supported efficient, reliable queries.
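To illustrate how primary and foreign keys tie the four entities together, the following is a minimal sketch using Python's built-in sqlite3 module. The table names come from the project. Every column name (vehicle_id, make, service_date, and so on) and the helper try_insert_maintenance are assumptions made for illustration, not the schema we delivered.

```python
import sqlite3

# Hypothetical columns: only the four table names come from the project.
SCHEMA = """
CREATE TABLE Vehicle (
    vehicle_id INTEGER PRIMARY KEY,
    make       TEXT NOT NULL
);
CREATE TABLE Maintenance (
    maintenance_id INTEGER PRIMARY KEY,
    vehicle_id     INTEGER NOT NULL REFERENCES Vehicle(vehicle_id),
    service_date   TEXT NOT NULL
);
CREATE TABLE Trip_Info (
    trip_id     INTEGER PRIMARY KEY,
    vehicle_id  INTEGER NOT NULL REFERENCES Vehicle(vehicle_id),
    distance_km REAL
);
CREATE TABLE Sensor_Data (
    reading_id    INTEGER PRIMARY KEY,
    vehicle_id    INTEGER NOT NULL REFERENCES Vehicle(vehicle_id),
    engine_temp_c REAL
);
"""


def build_database():
    """Create an in-memory database with foreign key checks switched on."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert_maintenance(conn, maintenance_id, vehicle_id, service_date):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Maintenance VALUES (?, ?, ?)",
            (maintenance_id, vehicle_id, service_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_database()
conn.execute("INSERT INTO Vehicle VALUES (1, 'Volvo')")

# A maintenance record for an existing vehicle is accepted.
ok = try_insert_maintenance(conn, 10, 1, "2024-01-15")
# A record pointing at a vehicle that does not exist is rejected.
orphan = try_insert_maintenance(conn, 11, 999, "2024-01-16")
```

In this sketch each child table carries only the vehicle_id of its parent, so descriptive vehicle attributes live in one place, which is the redundancy reduction that normalization aims for.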

A major area where the final project went beyond the initial proposal was in the depth of the data cleaning and preparation pipeline. While the proposal noted the importance of data quality, the final project required hands-on work in Python using pandas to explore, validate, and standardize the dataset. We handled missing values, standardized date and text formats, and validated numeric fields, ensuring that the data imported into SQL would maintain consistency. The pipeline included not only technical cleaning steps but also the logical mapping of variables from the raw dataset to appropriate SQL data types such as INT for IDs, VARCHAR for descriptive fields, FLOAT for measurements, DATE for time-stamped records, and BOOLEAN for status flags. This careful preparation was crucial for the operational success of the database, as even minor inconsistencies could have caused referential errors or inaccurate reporting.
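The cleaning steps described above can be sketched in pandas roughly as follows. This is an illustrative example under stated assumptions, not our actual pipeline: the sample rows, the column names, and the clean function are hypothetical, and the choice to drop rows without an ID and to coerce unparseable values to missing is one reasonable policy among several.

```python
import pandas as pd

# Hypothetical raw rows showing the kinds of problems described above.
raw = pd.DataFrame({
    "vehicle_id": ["1", "2", "3", None],
    "make": [" volvo", "SCANIA ", None, "Man"],
    "service_date": ["2024-01-15", "2024-02-30", "2024-03-01", "2024-03-05"],
    "engine_temp_c": ["88.5", "n/a", "91", "90.2"],
    "is_active": ["yes", "no", "YES", "no"],
})


def clean(df):
    """Return a copy with columns coerced to types that map onto SQL types."""
    out = df.copy()
    # INT: a row without an ID cannot be keyed, so it is dropped.
    out = out.dropna(subset=["vehicle_id"])
    out["vehicle_id"] = out["vehicle_id"].astype(int)
    # VARCHAR: fill gaps, trim whitespace, and standardize capitalization.
    out["make"] = out["make"].fillna("unknown").str.strip().str.title()
    # DATE: impossible dates such as 2024-02-30 become missing (NaT).
    out["service_date"] = pd.to_datetime(
        out["service_date"], format="%Y-%m-%d", errors="coerce"
    )
    # FLOAT: non-numeric placeholders such as "n/a" become missing (NaN).
    out["engine_temp_c"] = pd.to_numeric(out["engine_temp_c"], errors="coerce")
    # BOOLEAN: map the text flag to True/False regardless of case.
    out["is_active"] = out["is_active"].str.lower().eq("yes")
    return out


cleaned = clean(raw)
```

Coercing bad values to missing rather than raising an error keeps the load running while leaving the problem rows easy to find and review before import.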

The initial proposal touched on the need for security and compliance, but in the final project, we gave these aspects more detailed treatment. In particular, we addressed GDPR requirements. Although GDPR would bind a North American logistics company only if it processed the personal data of individuals in the EU, its requirements represent best practice in data management and position ADR Logistics for future growth or international expansion. The final system incorporated role-based access control, data encryption both at rest and in transit, explicit consent tracking, and audit logs for sensitive data changes. We also planned for efficient and secure data deletion to support the right to be forgotten and established clear procedures for breach notification and response. This approach not only mitigated risks around unauthorized access and data loss but also increased client trust in the integrity of the system.
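As one illustration of the audit logging and deletion ideas above, the following sketch uses triggers in Python's built-in sqlite3 module. It rests on several assumptions: the Driver and Audit_Log tables and all of their columns are hypothetical and do not appear in the project schema, and SQLite itself provides neither role-based access control nor encryption, so those controls are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table holding personal data.
CREATE TABLE Driver (
    driver_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
-- Hypothetical audit table: stores which record changed, not its contents.
CREATE TABLE Audit_Log (
    log_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT NOT NULL,
    record_id  INTEGER NOT NULL,
    action     TEXT NOT NULL,
    logged_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER driver_update_audit AFTER UPDATE ON Driver
BEGIN
    INSERT INTO Audit_Log (table_name, record_id, action)
    VALUES ('Driver', OLD.driver_id, 'UPDATE');
END;
CREATE TRIGGER driver_delete_audit AFTER DELETE ON Driver
BEGIN
    INSERT INTO Audit_Log (table_name, record_id, action)
    VALUES ('Driver', OLD.driver_id, 'DELETE');
END;
""")

conn.execute("INSERT INTO Driver VALUES (1, 'Sample Driver')")
conn.execute("UPDATE Driver SET full_name = 'Renamed Driver' WHERE driver_id = 1")
# An erasure request removes the personal record itself.
conn.execute("DELETE FROM Driver WHERE driver_id = 1")

audit_actions = [
    row[0] for row in conn.execute("SELECT action FROM Audit_Log ORDER BY log_id")
]
remaining = conn.execute("SELECT COUNT(*) FROM Driver").fetchone()[0]
```

Logging only the record ID and the action, rather than the old values, means the audit trail survives a deletion without retaining the personal data that was erased.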

Our team also reflected on database management system selection. The proposal initially assumed a relational (SQL) database, but during implementation, we compared relational and NoSQL approaches more closely. While NoSQL offers flexibility for unstructured or rapidly changing data, our analysis found that ADR Logistics’ needs for precise relationships, strong consistency, and complex queries were best served by a relational database. This evaluation reinforced the importance of technology fit and provided practical experience in justifying technology choices in a real business context.

Professionally, working on this project advanced my skills in several important areas. I learned to translate business requirements into technical specifications, using entity-relationship diagrams and normalization techniques to create a model that could be directly implemented. The experience of cleaning and structuring raw operational data, especially sensor and trip records, which often arrive with missing or inconsistent fields, gave me firsthand insight into the challenges of preparing such data for analytics. I also became more familiar with industry standards for compliance and security, which are essential in the modern transportation sector as regulatory pressure increases and data privacy becomes a key concern.

On a personal level, the team-based approach strengthened my communication and collaboration abilities. Balancing diverse perspectives, from Python data cleaning to SQL schema design and documentation, taught me the value of integrating different strengths to achieve a common goal. Early stakeholder engagement and regular check-ins helped ensure that the evolving design remained aligned with operational needs. These lessons are directly applicable in my current work with big data in the transportation industry, where project success depends not just on technical knowledge but also on clear coordination between technical teams and business stakeholders.

In summary, the final project built on and exceeded the initial proposal, delivering a secure, compliant, and efficient database system tailored to ADR Logistics’ requirements. The hands-on experience with the full data pipeline, from raw data to operational database, has significantly contributed to my development as a data science student and as a professional working in big data for transportation.